Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis
نویسندگان
چکیده
Any document in Serbian language can be written in two different scripts: Latin or Cyrillic. Although characteristics of these scripts are similar, some of their statistical measures are quite different. The paper proposed a method for the extraction of certain script from document according to the occurrence and co-occurrence of the script types. First, each letter is modeled with the certain script type according to characteristics concerning its position in baseline area. Then, the frequency analysis of the script types occurrence is performed. Due to diversity of Latin and Cyrillic script, the occurrence of modeled letters shows substantial statistics dissimilarity. Furthermore, the co-occurrence matrix is computed. The analysis of the co-occurrence matrix draws a strong margin as a criteria to distinguish and recognize the certain script. The proposed method is analyzed on the case of a database which includes different types of printed and web documents. The experiments gave encouraging results.
منابع مشابه
Global Approach for Script Identification using Wavelet Packet Based Features
In a multi script environment, an archive of documents having the text regions printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify different script regions of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documen...
متن کاملWavelet Packet Based Texture Features for Automatic Script Identification
In a multi script environment, an archive of documents printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documents printed in ten Indian scripts ...
متن کاملMapping the Scientific Structure of Iranian Brucellosis Researches Using the Co-authorship and Co-occurrence Network Analysis
Background and Objective: The evaluation of the publishing trend of articles in various scientific fields provides an insight into the efforts of researchers in the field of knowledge. Accordingly, the present study has evaluated and analyzed the scientific publications on brucellosis conducted by Iranian researchers using scientometrics methods and analysis of social networks. Methods: The pr...
متن کاملDrawing Word co-occurrence map of Spinal Muscular Atrophy disease
Introduction: The purpose of this article is to evaluate the status of articles in the field of Spinal Muscular Atrophy According to the Scientometrics indices Word co-occurrence map of this field . Methods: The present study is an applied one with a quantitative approach and a descriptive approach. It has been done using scientometrics and the co-occurrence words analysis technique. Document...
متن کاملThe analysis of co-citation and word co-occurrence networks of Iranian articles in the field of dentistry
Background and Aims: Dentistry is an important profession ensuring the health of body and soul, and has a special place in the scientific productions of medical disciplines. The purpose of this study was to analyze the co-citation and word co-occurrence of Iranian research papers in the field of dentistry based on indexed documents in Web of Science from 2014 to 2018. Materials and Methods:...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 2013 شماره
صفحات -
تاریخ انتشار 2013